{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0dcf3ddb",
   "metadata": {},
   "source": [
    "# Fine-Tuning Llama 2 using Low-Rank Adaptation (LoRA)\n",
    "Llama 2 has gained traction as a robust, powerful family of Large Language Models that can provide compelling responses on a wide range of tasks. While the base 7B, 13B, and 70B models serve as a strong baseline for multiple downstream tasks, they can lack in domain-specific knowledge or proprietary or otherwise sensitive information. Fine-tuning is often used as a means to update a model for a specific task or tasks to better respond to domain-specific prompts. This notebook walks through downloading the Llama 2-7B model from Hugging Face, preparing a dataset, and using Low Rank Adaptation (LoRA) to fine-tune the base model against the dataset.\n",
    "\n",
    "The implementation of LoRA is based on the paper, [LoRA: Low-Rank Adaptation of Large Language Models](https://openreview.net/pdf?id=nZeVKeeFYf9) by Hu et al.\n",
    "\n",
    "**IMPORTANT:** Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for reviewing and confirming compliance with third-party component license terms and requirements.\n",
    "\n",
    "## Model Preparation\n",
    "The model needs to be downloaded and converted prior to being fine-tuned. The following blocks walk through this process.\n",
    "\n",
    "### Downloading the model\n",
    "First, the 7B variant of Llama 2 needs to be downloaded from Meta. Llama 2 is an open model that is available for commercial use. To download the model, [submit a request on Meta's portal](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) for access to all models in the Llama family. Please note that your Hugging Face account email address MUST match the email you provide on the Meta website, or your request will not be approved.\n",
    "\n",
    "Once approved, use your Hugging Face username and API key to download Llama2 7B (non-chat version) to your workstation where you will be fine-tuning the model. To pull the model files to your local machine, you may navigate on your local machine to the folder you specified as the mount for ```/project/models``` and use a ```git lfs clone https://huggingface.co/<namespace>/<repo-name>``` call to [Meta's HF repository](https://huggingface.co/meta-llama/Llama-2-7b-hf). \n",
    "\n",
    "Once you have the repository cloned locally, you can double check that the model is in the correct location if you can see ```/models/Llama-2-7b-hf``` show up on the left hand side panel of this jupyterlab. Ensure each of the following files are present: \n",
    "\n",
    "* config.json\n",
    "\n",
    "* generation_config.json\n",
    "\n",
    "* pytorch_model-00001-of-00002.bin\n",
    "\n",
    "* pytorch_model-00002-of-00002.bin\n",
    "\n",
    "* pytorch_model.bin.index.json\n",
    "\n",
    "* special_tokens_map.json\n",
    "\n",
    "* tokenizer_config.json\n",
    "\n",
    "* tokenizer.json\n",
    "\n",
    "* tokenizer.model\n",
    "\n",
    "### Converting to .nemo format\n",
    "Prior to fine-tuning the model, it needs to be converted to the .nemo format used by the NeMo Toolkit, which is a set of tools and configs comprising a toolkit for working with conversational AI. The NeMo Framework training container includes a script to convert Llama models to the .nemo format. **The following cell assumes the model has been saved to /models/Llama-2-7b-hf inside the container.**\n",
    "\n",
    "After conversion, a `llama-7b.nemo` file should be present in `/models`. This file will be used for fine-tuning for the remainder of the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d22dd95",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install --upgrade ipywidgets\n",
    "!python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py --in-file=../models/Llama-2-7b-hf/ --out-file=../models/llama-7b.nemo\n",
    "!ls -la ../models/llama-7b.nemo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09ba84d7",
   "metadata": {},
   "source": [
    "## Preparing The Dataset\n",
    "We will be using LoRA to teach our model to do Extractive Question Answering. The dataset being used for fine-tuning needs to be converted to a .jsonl file and follow a specific format. In general, question and answer datasets are easiest to work with by providing context (if applicable), a question, and the expected answer, though different downstream tasks work as well.\n",
    "\n",
    "### Downloading the dataset\n",
    "We will be using the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) reading comprehension dataset, consisting of questions posed by crowd workers on a set of Wikipedia articles, where the answer to every question is a segment of text. More information on [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) can be found on their website or in their paper by Rajpurkar et. al \"[Know What You Don’t Know: Unanswerable Questions for SQuAD](https://arxiv.org/pdf/1806.03822.pdf)\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb43a17c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from omegaconf import OmegaConf\n",
    "import os\n",
    "import wget\n",
    "import sys\n",
    "os.environ['OPENBLAS_NUM_THREADS'] = '8'\n",
    "\n",
    "BRANCH='main'\n",
    "DATA_DIR = \"/project/data\"\n",
    "NEMO_DIR = \"/project/code\"\n",
    "\n",
    "os.makedirs(DATA_DIR, exist_ok=True)\n",
    "SQUAD_DIR = os.path.join(DATA_DIR, \"SQuAD\")\n",
    "os.makedirs(SQUAD_DIR, exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f41069b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the SQuAD dataset\n",
    "!wget -nc https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json\n",
    "!wget -nc https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json\n",
    "!mv train-v1.1.json {SQUAD_DIR}\n",
    "!mv dev-v1.1.json {SQUAD_DIR}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "087fe26a",
   "metadata": {},
   "source": [
    "### Preprocessing the dataset\n",
    "Datasets often need some form of preprocessing to convert it into a form ready for fine-tuning. LoRA (and all PEFT tuning) models expect at least two fields in the jsonl files. The `input` field should contain all the tokens necessary for the model to generate the `output`. For example for extractive QA, the `input` should contain the context text as well as the question.\n",
    "\n",
    "```\n",
    "[\n",
    "    {\"input\": \"User: Context: [CONTEXT_1] Question: [QUESTION_1]\\n\\nAssistant:\", \"output\": [ANSWER_1]},\n",
    "    {\"input\": \"User: Context: [CONTEXT_2] Question: [QUESTION_2]\\n\\nAssistant:\", \"output\": [ANSWER_2]},\n",
    "    {\"input\": \"User: Context: [CONTEXT_3] Question: [QUESTION_3]\\n\\nAssistant:\", \"output\": [ANSWER_3]},\n",
    "]\n",
    "```\n",
    "Note that we use keywords in the input like `Context:`, `Question:` to separate the text representing the context and question. We also use the keyword `User:` and end each of the input with `\\n\\nAssistant:` tokens. These are recommended because NeMo's instruction-tuned models are trained with a prefix of `User:` and suffix `\\n\\nAssistant:`.\n",
    "\n",
    "The SQuAD dataset does not already reflect this, so let's go ahead and preprocess it to fit the above format. Feel free to take a look inside the `prompt_learning_squad_preprocessing.py` script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7123e11c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Preprocess squad data\n",
    "!python /opt/NeMo/scripts/dataset_processing/nlp/squad/prompt_learning_squad_preprocessing.py --sft-format --data-dir {SQUAD_DIR}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64e98b65",
   "metadata": {},
   "source": [
    "Let's split the datasets into train and validation files, and take a look at a few samples of the data to confirm the preprocessing is satisfactory. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85af900f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# What the squad dataset looks like after processing\n",
    "! head -5000 $SQUAD_DIR/squad_train.jsonl > $SQUAD_DIR/squad_short_train.jsonl\n",
    "! head -500 $SQUAD_DIR/squad_val.jsonl > $SQUAD_DIR/squad_short_val.jsonl\n",
    "! head -4 $SQUAD_DIR/squad_short_val.jsonl\n",
    "! head -4 $SQUAD_DIR/squad_short_train.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "320e68e7",
   "metadata": {},
   "source": [
    "This format is used for both the `train` and `val` files. These files will be used as datasets for the fine-tuning process. The dataset is now ready to be used for fine-tuning the model!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23ecb18b",
   "metadata": {},
   "source": [
    "## Configuring the job\n",
    "With the dataset preparation finished, we need to update the default configuration for our fine-tuning job. The sample config file provided by NeMo is a good template to base our changes on. Let's load the file as an object that we can edit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d96abcb",
   "metadata": {},
   "outputs": [],
   "source": [
    "config = OmegaConf.load(\"/opt/NeMo/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_tuning_config.yaml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f868ca57",
   "metadata": {},
   "source": [
    "With the config loaded, we can override certain settings for our environment. Many of the default values shown here would work but some key points are called out below:\n",
    "\n",
    "* `config.trainer.precision=32` - This is the precision that will be used during fine-tuning. The model might be more accurate with higher values but it also uses more memory than lower precisions. If the fine-tuning process runs out of memory, try reducing the precision here.\n",
    "* `config.trainer.devices=1` - This is the number of devices that will be used. If running on a multi-GPU system, increase this number as appropriate.\n",
    "* `config.model.restore_from_path=\"/models/llama-7b.nemo\"` - This is the path to the converted `.nemo` checkpoint from the beginning of the notebook. If the path changed, update it here.\n",
    "* `config.model.data.train_ds.file_names` and `config.model.data.validation_ds.file_names` - If a different filename or path was used for the dataset files created earlier, specify the new values here.\n",
    "* `config.model.global_batch_size` - If using a higher GPU count or if additional GPU memory allows, this value can be increased for higher performance. Note that higher batch sizes use more GPU memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf41eeae",
   "metadata": {},
   "outputs": [],
   "source": [
    "config.trainer.precision = 32\n",
    "config.trainer.devices = 1\n",
    "config.trainer.max_epochs = 4\n",
    "config.trainer.val_check_interval = 1.0\n",
    "config.trainer.max_steps=1000\n",
    "config.trainer.val_check_interval=50\n",
    "\n",
    "config.model.peft.peft_scheme=\"lora\"  # we can also set this to adapter or ptuning or ia3\n",
    "config.model.data.train_ds.file_names = [f\"{SQUAD_DIR}/squad_short_train.jsonl\"]\n",
    "config.model.data.train_ds.concat_sampling_probabilities=[1.0]\n",
    "config.model.data.validation_ds.file_names = [f\"{SQUAD_DIR}/squad_short_val.jsonl\"]\n",
    "config.model.data.validation_ds.names=[\"squad_val\"]\n",
    "config.model.data.train_ds.prompt_template =\"{input} {output}\"\n",
    "\n",
    "config.model.restore_from_path=\"/project/models/llama-7b.nemo\"\n",
    "config.exp_manager.exp_dir=f\"{NEMO_DIR}/peft_lora\"\n",
    "config.exp_manager.explicit_log_dir=\"training_info\"\n",
    "config.model.micro_batch_size=1\n",
    "config.model.global_batch_size=4\n",
    "config.model.data.train_ds.num_workers=0  # 0 is recommended which just uses the main thread to process training examples\n",
    "config.model.data.validation_ds.num_workers=0 # 0 is recommended which just uses the main thread to process the validation examples\n",
    "\n",
    "os.environ[\"LOCAL_RANK\"] = '0'\n",
    "os.environ[\"RANK\"] = '0'\n",
    "os.environ[\"WORLD_SIZE\"] = '1'\n",
    "\n",
    "# Final model config\n",
    "print(OmegaConf.to_yaml(config.model))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f19bc817",
   "metadata": {},
   "source": [
    "With the config settings updated, save it as a `.yaml` file that can be read by NeMo Toolkit during fine-tuning and save it to the fine-tuning configuration directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4323eb91",
   "metadata": {},
   "outputs": [],
   "source": [
    "# OmegaConf.save can also accept a `str` or `pathlib.Path` instance:\n",
    "OmegaConf.save(config, \"llama-config.yaml\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e38de872",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mv llama-config.yaml /opt/NeMo/examples/nlp/language_modeling/tuning/conf/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2048391b",
   "metadata": {},
   "source": [
    "## Launching the job\n",
    "With the model downloaded, the dataset prepped, and the config set, it is now time to launch the fine-tuning job! The following block launches the job on the specified number of GPUs. Depending on the size of the dataset and the GPU used, this could take anywhere from a few minutes to several hours to finish. As the model is tuned, checkpoints will be saved in the `training_info` directory inside the container. These checkpoints contain prompt embeddings which are used to send inference requests with the fine-tuned weights to deployed models so they respond as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cea7f2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py \\\n",
    "    --config-name=llama-config.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "634786d5",
   "metadata": {},
   "source": [
    "### Configuring the evaluation script\n",
    "The configuration file for the evaluation job needs to be updated based on the provided template to reflect the changes for the experiment here. Load the template config and update some of the settings to match the local environment. Note the following settings may differ for custom datasets:\n",
    "* `config.model.data.test_ds.file_names` - List any prediction files that should be used to evaluate the model. In general, it is recommended to have this be different from the training and validation files used during fine-tuning. For the sake of simplicity, we test on the validation set. You may generate your own test set if you wish. \n",
    "* `config.model.data.test_ds.names` - Change the name of the dataset used here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8bf017d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the template config file\n",
    "config = OmegaConf.load(\"/opt/NeMo/examples/nlp/language_modeling/tuning/conf/megatron_gpt_peft_eval_config.yaml\")\n",
    "\n",
    "# Override required settings\n",
    "config.model.restore_from_path=\"/project/models/llama-7b.nemo\"\n",
    "config.model.peft.restore_from_path=\"/project/code/training_info/checkpoints/megatron_gpt_peft_tuning.nemo\"\n",
    "config.model.data.test_ds.file_names=[f\"{SQUAD_DIR}/squad_short_val.jsonl\"]\n",
    "config.model.data.test_ds.names=[\"squad\"]\n",
    "config.model.data.test_ds.global_batch_size=1\n",
    "config.model.data.test_ds.micro_batch_size=1\n",
    "config.model.data.test_ds.tokens_to_generate=30\n",
    "config.model.data.test_ds.write_predictions_to_file=True\n",
    "config.model.data.test_ds.output_file_path_prefix=\"/project/code/predictions\"\n",
    "config.inference.greedy=True\n",
    "\n",
    "# Save the new config file\n",
    "OmegaConf.save(config, \"llama-eval-config.yaml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6335b33a",
   "metadata": {},
   "source": [
    "Once the config is saved, evaluation can be launched below. Depending on the size of the hardware and the number of inference examples, this may take a few minutes to complete. Results will be saved to `code/predictions_test_squad_inputs_preds_labels.jsonl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57ce48e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mv llama-eval-config.yaml /opt/NeMo/examples/nlp/language_modeling/tuning/conf/\n",
    "!python /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_peft_eval.py --config-name=llama-eval-config.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1ba8d00",
   "metadata": {},
   "source": [
    "Note depending on hardware and the number of examples used, the evaluation script may take a while to run since we are using a training container setting and not currently optimizing for inference. Once we are ready to serve the finetuned model for true deployment, we may then move the model to an optimized inference framework like Triton and/or TensorRT-LLM. \n",
    "\n",
    "After the evaluation script completes, view the results. Keep in mind the results you see may vary in quality for a variety of reasons: \n",
    "* The 7B parameter version of Llama 2 is used in this workflow. 13B and 70B may generate higher quality responses. \n",
    "* Hyperparameters can be adjusted in the config files to improve performance. \n",
    "* More or better performing hardware can enable more efficient fine-tuning for more epochs. \n",
    "\n",
    "The point is fine tuning the out-of-the-box model to the general QA task seems to be easy and straightforward with this workflow; just imagine what 13B and 70B results will look like!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c00f301",
   "metadata": {},
   "outputs": [],
   "source": [
    "!head -n 5 /project/code/predictions_test_squad_inputs_preds_labels.jsonl "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba7f26fa",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
